[AIRFLOW-XXX] GSoD: Adding Task re-run documentation #6295
docs/index.rst
Outdated
@@ -83,7 +83,11 @@ Content
ui
concepts
scheduler
<<<<<<< HEAD
Something is broken here.
Codecov Report
@@ Coverage Diff @@
## master #6295 +/- ##
==========================================
- Coverage 80.41% 80.34% -0.08%
==========================================
Files 612 616 +4
Lines 35473 35733 +260
==========================================
+ Hits 28525 28709 +184
- Misses 6948 7024 +76
Continue to review full report at Codecov.
docs/dag-run.rst
Outdated
An Airflow DAG with a ``start_date``, possibly an ``end_date``, and a ``schedule_interval`` defines a
series of intervals which the scheduler turn into individual DAG Runs and execute. A key capability
of Airflow is that these DAG Runs are atomic and idempotent items. The scheduler, by default, will
kick off a DAG Run for any interval that has not been run (or has been cleared). This concept is called Catchup.
This is mostly true, but slightly misleading in a few edge cases. I'm not sure if this is worth mentioning here or not.
Catchup, and the scheduler in general, will not "fill in gaps": it only looks forward from the most recent dag run.
For example:
- I have a daily dag with catchup=False running. This is "d0"
- I pause that dag for 3 days
- I then start it again.
At this point we have dagruns for d0, d4, d5, ...
- I edit the dag to set catchup=True
The scheduler will not go and "fill in" d1, d2, d3.
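The behaviour described above can be sketched with a toy model. This is not Airflow's scheduler code; `runs_to_create` is a hypothetical helper, the dates are made up, and the model ignores the offset between an execution date and the moment the run is triggered. It only illustrates "look forward from the most recent dag run":

```python
from datetime import date, timedelta


def runs_to_create(existing_runs, today, interval=timedelta(days=1)):
    """Toy model: only schedule dates after the most recent existing run."""
    next_date = max(existing_runs) + interval
    new_runs = []
    while next_date <= today:
        new_runs.append(next_date)
        next_date += interval
    return new_runs


# d0, then a 3-day pause, then d4 and d5 (hypothetical dates).
d0 = date(2019, 10, 1)
existing = [d0, d0 + timedelta(days=4), d0 + timedelta(days=5)]

# Even with catchup enabled, this model never goes back to d1, d2 or d3.
new_runs = runs_to_create(existing, today=d0 + timedelta(days=7))
print(new_runs)
```

In this model the gap left by the pause stays empty; only the dates after d5 are picked up.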
Updated with changes.
docs/dag-run.rst
Outdated
default_args = {
    'owner': 'Airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 12, 1),
I know this is what we do in most cases in our docs, but can you move start_date out of default_args and into just a DAG arg, please?
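A minimal sketch of what this request might look like. The `dag_id` value `tutorial` and the daily interval are assumptions for illustration, and the `schedule_interval` keyword is the one used by the Airflow version under review here (newer releases may name it differently). The actual `DAG(...)` call is left as a comment so the snippet runs without Airflow installed:

```python
from datetime import datetime, timedelta

# Task-level defaults shared by every operator; start_date no longer lives here.
default_args = {
    'owner': 'Airflow',
    'depends_on_past': False,
}

# Keyword arguments for the DAG itself. 'tutorial' and the daily interval
# are hypothetical values chosen for illustration.
dag_kwargs = {
    'dag_id': 'tutorial',
    'default_args': default_args,
    'start_date': datetime(2015, 12, 1),
    'schedule_interval': timedelta(days=1),
}

# With Airflow installed, the DAG would then be built as:
#     from airflow import DAG
#     dag = DAG(**dag_kwargs)
```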
docs/dag-run.rst
Outdated
**Note**: When clearing a set of tasks’ state in hope of getting them to re-run, it is important
to keep in mind the DAG Run’s state too as it defines whether the scheduler should look
into triggering tasks for that run.
I think that if you clear tasks it will also set the dagrun state to running? But I'm not certain of that.
Looking good overall though!
@ashb Preview Links for the PR:
@@ -84,6 +84,7 @@ Content
concepts
scheduler
executor/index
dag-run
I think this probably makes more sense as a page under Concepts -- it doesn't fit with the rest of the top level items we have.
WDYT?
@mik-laj and I were discussing the same point: the concepts page needs to be broken down, and some subpages need to move inside it. Since that would require changes in many pages, can we do that as part of another PR?
We can have a subpage like https://airflow.readthedocs.io/en/stable/howto/index.html and have different topics there
@KKcorps What did we decide about this? Merge this PR as is, then you split up concepts into multiple pages afterwards?
Yes, splitting up the page is a big effort. I think we should merge this and then split it up later.
@@ -32,161 +30,10 @@ Airflow production environment. To kick it off, all you need to do is
execute ``airflow scheduler``. It will use the configuration specified in
``airflow.cfg``.

Note that if you run a DAG on a ``schedule_interval`` of one day,
This was quite an important point. I feel we should direct people to the new dag run page (wherever we decide it lives) from here too.
This isn't done yet. A "see also" link is not strong enough.
docs/dag-run.rst
Outdated
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in backend.

**Note**: If you run a DAG on a schedule_interval of one day, the run stamped 2020-01-01
I think this and the next paragraph should be in a .. note:: block to make them stand out more. What does that look like?
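For context on the note under review: as far as I understand the Airflow documentation of this period, a run is triggered once the interval it covers has ended, not at the date it is stamped with. A small sketch of that arithmetic, where `earliest_trigger` is a hypothetical helper and not an Airflow function:

```python
from datetime import datetime, timedelta


def earliest_trigger(execution_date, schedule_interval):
    """Earliest moment a run can start: the end of the interval it covers."""
    return execution_date + schedule_interval


# A daily DAG: the run stamped 2020-01-01 covers that whole day.
trigger_time = earliest_trigger(datetime(2020, 1, 1), timedelta(days=1))
print(trigger_time.isoformat())
```

Under this assumption the run stamped 2020-01-01 cannot start before 2020-01-02T00:00:00, which is the point the note is trying to make stand out.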
This would be very helpful as a document to share with folks onboarding to Airflow. Even if it's imperfect, I think it would be great to get this into the codebase.
Co-Authored-By: Ash Berlin-Taylor <[email protected]>
Co-Authored-By: Ash Berlin-Taylor <[email protected]>
docs/scheduler.rst
Outdated
To start a scheduler, simply run the command:

.. code:: bash

    airflow scheduler

You can start executing a DAG once your scheduler has started running successfully.
"executing" a dag doesn't quite make sense. Either we mean enable it and let the scheduler execute it, or we mean "trigger a manual run".
I'll change it to 'your dags will start executing'
docs/dag-run.rst
Outdated
Re-run DAG
''''''''''
There can be cases where you will want to execute your DAG again. One such case is when the scheduled
DAG run fails. Another can be the scheduled DAG run wasn't executed due to low resources or the DAG being turned off.
Low resources shouldn't stop a dag run from being created, so let's remove that bit.
Co-Authored-By: Ash Berlin-Taylor <[email protected]>
docs/dag-run.rst
Outdated
An Airflow DAG with a ``start_date``, possibly an ``end_date``, and a ``schedule_interval`` defines a
series of intervals which the scheduler turn into individual DAG Runs and execute. A key capability
of Airflow is that these DAG Runs are atomic and idempotent items. The scheduler, by default, will
- of Airflow is that these DAG Runs are atomic and idempotent items. The scheduler, by default, will
+ of Airflow is that these DAG Runs should be atomic and idempotent items. The scheduler, by default, will
(should be, as Airflow doesn't do this magically, but it depends on how the tasks are written.)
OK, then I guess I should just remove the line, as it doesn't make much sense.
Or I can add: "The runs should be atomic and idempotent for the catchup to function as expected."
docs/dag-run.rst
Outdated
Backfill
---------
There can be the case when you may want to run the dag for a specified historical period e.g. a data pipeline
which dumps data in a DFS every day and another pipeline which requires last 1 month of data in DFS.
I'm not sure this is the best example of backfill.
Backfill's primary aim is for running for date periods before the start date of the task. (Otherwise the scheduler would see the date is before start_date and not do anything)
docs/dag-run.rst
Outdated
    airflow backfill -s START_DATE -e END_DATE dag_id

The above command will re-run all the instances of the dag_id for all the intervals within the start date and end date.
We should link to https://airflow.apache.org/cli.html#backfill
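To make the command shape in the diff concrete, here is a small sketch that only builds the argument list and does not run anything. `backfill_command` is a hypothetical helper, `tutorial` is an assumed dag id, and the dates are chosen to fall before the `start_date` of 2015-12-01 used elsewhere in this PR, following the reviewer's description of what backfill is for. The flags are the ones quoted in the diff; other Airflow releases may spell the command differently:

```python
from datetime import date


def backfill_command(dag_id, start, end):
    """Build the argv for the backfill command quoted in the diff (not executed)."""
    return [
        "airflow", "backfill",
        "-s", start.isoformat(),
        "-e", end.isoformat(),
        dag_id,
    ]


# Hypothetical dag id and a window before the DAG's start_date.
cmd = backfill_command("tutorial", date(2015, 11, 1), date(2015, 11, 30))
print(" ".join(cmd))
```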
docs/dag-run.rst
Outdated
Click on the failed task in the Tree or Graph views and then click on **Clear**.
``
failed to ``None`` and the executor will re-run it.
Odd line break here.
.. code:: bash

    airflow tasks clear -h
Link to cli docs here too.
Should I remove the example and just have the CLI doc link or put CLI docs link below the example?
docs/dag-run.rst
Outdated
There are multiple options you can select to re-run -

* Past - All the instances of the task in the runs before the current DAG's execution date
- * Past - All the instances of the task in the runs before the current DAG's execution date
+ * **Past** - All the instances of the task in the runs before the current DAG's execution date
We should probably bold all the options
Co-Authored-By: Kaxil Naik <[email protected]>
(cherry picked from commit ac2d0be)
Make sure you have checked all steps below.
Jira
Description
Tests
Commits
Documentation